mirror the args and test_cfg paths for worktree #792
I'm not sure how brittle this is; it has already broken once. We should have unit tests for the core parts of this.
```python
    relative_path = path.relative_to(src_root)
    return dest_root / relative_path
```
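As a starting point for the unit tests requested above, here is a minimal sketch. It assumes `mirror_path` takes `(path, src_root, dest_root)`, which is inferred from the call sites in the generated tests; the function body is copied from the two lines under review, and the `/repo` and `/worktree` paths are made up for illustration. The tests use plain `assert` so they can be collected by pytest or run directly.

```python
from __future__ import annotations

from pathlib import Path


def mirror_path(path: Path, src_root: Path, dest_root: Path) -> Path:
    # Body taken from the lines under review; the signature is an assumption
    # inferred from how the generated tests call the function.
    relative_path = path.relative_to(src_root)
    return dest_root / relative_path


def test_file_under_root_is_mirrored() -> None:
    # A nested module root (src/app) keeps its relative location.
    result = mirror_path(Path("/repo/src/app/main.py"), Path("/repo"), Path("/worktree"))
    assert result == Path("/worktree/src/app/main.py")


def test_root_maps_to_dest_root() -> None:
    # Mirroring the root itself yields the destination root.
    assert mirror_path(Path("/repo"), Path("/repo"), Path("/worktree")) == Path("/worktree")


def test_path_outside_root_raises() -> None:
    # "/repo2" shares a string prefix with "/repo" but is not inside it.
    try:
        mirror_path(Path("/repo2/main.py"), Path("/repo"), Path("/worktree"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a path outside src_root")
```

The third test is the one most likely to catch a regression if the prefix check is ever rewritten as a string comparison.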
⚡️Codeflash found 41% (0.41x) speedup for mirror_path in codeflash/optimization/optimizer.py
⏱️ Runtime : 14.2 milliseconds → 10.1 milliseconds (best of 90 runs)
📝 Explanation and details
The optimization replaces Path.relative_to() with direct manipulation of path components using .parts, resulting in a 40% speedup.
Key optimizations:
- Eliminated expensive `relative_to()` method: The original code calls `path.relative_to(src_root)`, which internally creates a new Path object and performs complex validation logic. The optimized version directly accesses `.parts` tuples and uses simple tuple slicing.
- Direct tuple comparison: Instead of Path's internal prefix checking, the code compares `path_parts[:len(src_parts)] != src_parts`, a fast tuple slice comparison that avoids object creation.
- Efficient path reconstruction: Uses `joinpath(*relative_parts)` with tuple unpacking rather than the `/` operator on Path objects, reducing intermediate Path object allocations.
Performance characteristics:
- Best gains (40-98% faster): Cases with simple path structures, root mirroring, and roots with many components benefit most from avoiding `relative_to()`'s overhead.
- Slight slowdowns (6-29% slower): Paths with many components below `src_root` (50 or more levels in the generated tests) and some relative paths, where tuple operations become less efficient than Path's optimized internal handling.
- Consistent improvements: Most real-world use cases see 10-50% speedups, especially beneficial for batch operations processing thousands of files.
The optimization maintains identical behavior including proper ValueError handling for invalid paths, making it a drop-in performance improvement.
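The equivalence claim can be spot-checked with a small sketch like the one below. The two function names and the `/repo`, `/worktree` paths are hypothetical, chosen only to place both versions side by side; the bodies follow the original and suggested code. Only the exception type is compared, since the error messages of the two versions differ. This sketch uses POSIX-style paths and does not cover platforms where pathlib compares paths case-insensitively, where the tuple comparison may behave differently.

```python
from __future__ import annotations

from pathlib import Path


def mirror_path_relative_to(path: Path, src_root: Path, dest_root: Path) -> Path:
    # Original approach: let pathlib compute and validate the relative path.
    return dest_root / path.relative_to(src_root)


def mirror_path_parts(path: Path, src_root: Path, dest_root: Path) -> Path:
    # Suggested approach: compare path components as tuples.
    path_parts = path.parts
    src_parts = src_root.parts
    if path_parts[: len(src_parts)] != src_parts:
        raise ValueError(f"{path!r} does not start with {src_root!r}")
    return dest_root.joinpath(*path_parts[len(src_parts):])


def outcome(fn, path: Path, src_root: Path, dest_root: Path):
    # Return the mirrored path, or the ValueError class if the call raises it.
    try:
        return fn(path, src_root, dest_root)
    except ValueError:
        return ValueError


# Illustrative cases: nested file, root itself, sibling with a shared
# string prefix, and a purely relative layout.
CASES = [
    (Path("/repo/src/app/main.py"), Path("/repo"), Path("/worktree")),
    (Path("/repo"), Path("/repo"), Path("/worktree")),
    (Path("/repo2/main.py"), Path("/repo"), Path("/worktree")),
    (Path("src/app/main.py"), Path("src"), Path("dest")),
]

results = [
    (outcome(mirror_path_relative_to, *case), outcome(mirror_path_parts, *case))
    for case in CASES
]
for case, (old, new) in zip(CASES, results):
    print(case[0], "->", old, "| same result:", old == new)
```

Because the comparison is per component rather than per character, `/repo2/main.py` is rejected by both versions even though its string form starts with `/repo`.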
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 4037 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.optimization.optimizer import mirror_path

# unit tests

# ----------------------
# 1. Basic Test Cases
# ----------------------

def test_basic_file_mirroring():
    # Simple file mirroring
    src_root = Path("/home/user/source")
    dest_root = Path("/mnt/backup/dest")
    file_path = src_root / "docs/readme.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 8.16μs -> 7.16μs (13.9% faster)

def test_basic_directory_mirroring():
    # Directory mirroring
    src_root = Path("/data/src")
    dest_root = Path("/data/dest")
    dir_path = src_root / "images"
    codeflash_output = mirror_path(dir_path, src_root, dest_root); mirrored = codeflash_output  # 7.43μs -> 5.65μs (31.5% faster)

def test_basic_root_mirroring():
    # Mirroring the root itself
    src_root = Path("/srcroot")
    dest_root = Path("/destroot")
    codeflash_output = mirror_path(src_root, src_root, dest_root); mirrored = codeflash_output  # 6.97μs -> 4.90μs (42.1% faster)

def test_basic_nested_mirroring():
    # Nested file path
    src_root = Path("/a/b")
    dest_root = Path("/x/y")
    file_path = src_root / "c/d/e.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 7.81μs -> 7.13μs (9.51% faster)

# ----------------------
# 2. Edge Test Cases
# ----------------------

def test_path_not_under_src_root_raises():
    # Path not under src_root should raise ValueError
    src_root = Path("/home/user/source")
    dest_root = Path("/mnt/backup/dest")
    file_path = Path("/home/user/otherdir/file.txt")
    with pytest.raises(ValueError):
        mirror_path(file_path, src_root, dest_root)  # 12.4μs -> 11.5μs (8.24% faster)

def test_src_root_is_relative():
    # Relative src_root
    src_root = Path("relative/src")
    dest_root = Path("relative/dest")
    file_path = src_root / "file.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 5.86μs -> 5.40μs (8.45% faster)

def test_dest_root_is_relative():
    # Relative dest_root
    src_root = Path("/absolute/src")
    dest_root = Path("relative/dest")
    file_path = src_root / "file.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 6.90μs -> 5.29μs (30.5% faster)

def test_path_is_relative():
    # Relative path
    src_root = Path("src")
    dest_root = Path("dest")
    file_path = src_root / "subdir/file.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 5.75μs -> 6.16μs (6.55% slower)

def test_src_root_trailing_slash():
    # src_root with trailing slash
    src_root = Path("/foo/bar/")
    dest_root = Path("/baz/qux")
    file_path = Path("/foo/bar/file.txt")
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 7.53μs -> 5.77μs (30.4% faster)

def test_path_is_symlink():
    # Path is a symlink (just as a string, not checking filesystem)
    src_root = Path("/src")
    dest_root = Path("/dest")
    file_path = src_root / "link"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 6.63μs -> 5.12μs (29.4% faster)

def test_src_root_is_dot():
    # src_root is "."
    src_root = Path(".")
    dest_root = Path("/backup")
    file_path = src_root / "foo/bar.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 5.16μs -> 5.97μs (13.6% slower)

def test_path_equals_src_root():
    # path == src_root
    src_root = Path("/abc/def")
    dest_root = Path("/xyz/uvw")
    codeflash_output = mirror_path(src_root, src_root, dest_root); mirrored = codeflash_output  # 7.09μs -> 4.51μs (57.0% faster)

def test_path_is_empty_relative_to_src_root():
    # path is src_root, so relative path is empty
    src_root = Path("/a")
    dest_root = Path("/b")
    codeflash_output = mirror_path(src_root, src_root, dest_root); mirrored = codeflash_output  # 6.66μs -> 4.42μs (50.8% faster)

def test_path_is_dot_relative_to_src_root():
    # path is "./", src_root is "."
    src_root = Path(".")
    dest_root = Path("/backup")
    file_path = Path(".")
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 4.39μs -> 4.76μs (7.76% slower)

def test_path_is_subdir_of_src_root():
    # path is an immediate subdir of src_root
    src_root = Path("/data")
    dest_root = Path("/mirror")
    file_path = src_root / "subdir"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 6.28μs -> 5.26μs (19.4% faster)

def test_path_has_multiple_slashes():
    # path with multiple slashes
    src_root = Path("/foo//bar")
    dest_root = Path("/baz")
    file_path = Path("/foo/bar//file.txt")
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 7.30μs -> 5.82μs (25.5% faster)

def test_path_is_not_subpath_of_src_root():
    # path is not a subpath of src_root (should raise)
    src_root = Path("/foo/bar")
    dest_root = Path("/baz")
    file_path = Path("/foo/bar2/file.txt")
    with pytest.raises(ValueError):
        mirror_path(file_path, src_root, dest_root)  # 12.1μs -> 11.4μs (6.06% faster)

def test_path_is_empty_string():
    # path is empty string (should raise)
    src_root = Path("/foo")
    dest_root = Path("/bar")
    file_path = Path("")
    with pytest.raises(ValueError):
        mirror_path(file_path, src_root, dest_root)  # 10.3μs -> 9.93μs (4.16% faster)

def test_src_root_is_empty_string():
    # src_root is empty string
    src_root = Path("")
    dest_root = Path("/bar")
    file_path = Path("foo.txt")
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 5.47μs -> 5.80μs (5.54% slower)

def test_dest_root_is_empty_string():
    # dest_root is empty string
    src_root = Path("/foo")
    dest_root = Path("")
    file_path = src_root / "bar.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 6.94μs -> 5.64μs (22.9% faster)

# ----------------------
# 3. Large Scale Test Cases
# ----------------------

def test_large_number_of_files():
    # Test mirroring for a large number of files
    src_root = Path("/src")
    dest_root = Path("/dest")
    for i in range(1000):
        file_path = src_root / f"folder_{i}/file_{i}.txt"
        codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 3.72ms -> 2.62ms (42.1% faster)

def test_deeply_nested_path():
    # Test with a deeply nested path
    src_root = Path("/deep")
    dest_root = Path("/mirror")
    nested = "a/" * 100 + "file.txt"
    file_path = src_root / nested
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 30.1μs -> 39.3μs (23.3% slower)

def test_large_src_and_dest_roots():
    # Large src_root and dest_root
    src_root = Path("/" + "/".join([f"src{i}" for i in range(20)]))
    dest_root = Path("/" + "/".join([f"dest{i}" for i in range(20)]))
    file_path = src_root / "file.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 11.2μs -> 5.66μs (97.9% faster)

def test_large_file_name():
    # File with a very long name
    src_root = Path("/src")
    dest_root = Path("/dest")
    long_name = "a" * 255 + ".txt"
    file_path = src_root / long_name
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 6.11μs -> 4.96μs (23.2% faster)

def test_large_number_of_subdirs():
    # Path with many subdirectories
    src_root = Path("/src")
    dest_root = Path("/dest")
    subdirs = "/".join([f"subdir{i}" for i in range(50)])
    file_path = src_root / subdirs / "file.txt"
    codeflash_output = mirror_path(file_path, src_root, dest_root); mirrored = codeflash_output  # 17.3μs -> 21.9μs (20.9% slower)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
```python
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.optimization.optimizer import mirror_path

# unit tests

# ----------------
# BASIC TEST CASES
# ----------------

def test_basic_file_mirroring():
    # Simple file mirroring
    src = Path("/src_root/dir/file.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root/dir/file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 6.61μs -> 6.16μs (7.30% faster)

def test_basic_directory_mirroring():
    # Simple directory mirroring
    src = Path("/src_root/dir/subdir")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root/dir/subdir")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 7.17μs -> 6.46μs (11.0% faster)

def test_basic_root_mirroring():
    # Mirroring the root itself
    src = Path("/src_root")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 6.42μs -> 5.09μs (26.2% faster)

def test_basic_nested_file_mirroring():
    # Mirroring a deeply nested file
    src = Path("/src_root/a/b/c/d/e.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root/a/b/c/d/e.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 8.15μs -> 7.86μs (3.64% faster)

# ----------------
# EDGE TEST CASES
# ----------------

def test_path_not_under_src_root():
    # path is not under src_root (should raise ValueError)
    src = Path("/other_root/file.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    with pytest.raises(ValueError):
        mirror_path(src, src_root, dest_root)  # 11.9μs -> 11.6μs (2.48% faster)

def test_src_and_dest_are_same():
    # src_root and dest_root are the same
    src = Path("/root/dir/file.txt")
    src_root = Path("/root")
    dest_root = Path("/root")
    expected = Path("/root/dir/file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 8.34μs -> 7.06μs (18.1% faster)

def test_relative_paths():
    # Using relative paths
    src = Path("src_root/dir/file.txt")
    src_root = Path("src_root")
    dest_root = Path("dest_root")
    expected = Path("dest_root/dir/file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 6.54μs -> 6.29μs (4.04% faster)

def test_dot_paths():
    # Using '.' and '..' in paths
    src = Path("/src_root/dir/../dir2/file.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    # Path() keeps '..' as written; the resolve() call below collapses it
    expected = Path("/dest_root/dir2/file.txt")
    codeflash_output = mirror_path(src.resolve(), src_root, dest_root)  # 8.02μs -> 6.59μs (21.5% faster)

def test_trailing_slash():
    # src_root and dest_root with trailing slashes
    src = Path("/src_root/dir/file.txt")
    src_root = Path("/src_root/")
    dest_root = Path("/dest_root/")
    expected = Path("/dest_root/dir/file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 7.50μs -> 6.32μs (18.8% faster)

def test_empty_relative_path():
    # Path is exactly src_root, so relative_path is empty
    src = Path("/src_root")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 6.77μs -> 5.24μs (29.3% faster)

def test_case_sensitivity():
    # Case sensitivity (should fail on case mismatch)
    src = Path("/SRC_ROOT/dir/file.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    with pytest.raises(ValueError):
        mirror_path(src, src_root, dest_root)  # 12.5μs -> 11.8μs (6.17% faster)

def test_symlink_path():
    # Symlinked path (should work as long as path is under src_root)
    src = Path("/src_root/dir/link_to_file.txt")
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    expected = Path("/dest_root/dir/link_to_file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 8.68μs -> 7.58μs (14.6% faster)

# ----------------------
# LARGE SCALE TEST CASES
# ----------------------

def test_large_number_of_files():
    # Large number of files in a flat directory
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    for i in range(1000):
        src = src_root / f"file_{i}.txt"
        expected = dest_root / f"file_{i}.txt"
        codeflash_output = mirror_path(src, src_root, dest_root)  # 3.51ms -> 2.28ms (53.7% faster)

def test_large_nested_structure():
    # Large nested directory structure
    src_root = Path("/src_root")
    dest_root = Path("/dest_root")
    # Create a path with 1000 nested directories
    nested_dirs = [f"dir_{i}" for i in range(1000)]
    src = src_root.joinpath(*nested_dirs, "file.txt")
    expected = dest_root.joinpath(*nested_dirs, "file.txt")
    codeflash_output = mirror_path(src, src_root, dest_root)  # 211μs -> 299μs (29.4% slower)

def test_large_relative_paths():
    # Large number of relative paths
    src_root = Path("src_root")
    dest_root = Path("dest_root")
    for i in range(1000):
        src = Path("src_root") / f"file_{i}.txt"
        expected = Path("dest_root") / f"file_{i}.txt"
        codeflash_output = mirror_path(src, src_root, dest_root)  # 2.92ms -> 2.30ms (27.3% faster)

def test_large_mixed_paths():
    # Mix of absolute and relative paths
    src_root = Path("/src_root")
    dest_root = Path("dest_root")
    for i in range(1000):
        src = Path("/src_root") / f"file_{i}.txt"
        expected = Path("dest_root") / f"file_{i}.txt"
        codeflash_output = mirror_path(src, src_root, dest_root)  # 3.51ms -> 2.31ms (51.9% faster)
```

To test or edit this optimization locally, run `git merge codeflash/optimize-pr792-2025-10-04T20.20.00`.
```diff
-    relative_path = path.relative_to(src_root)
-    return dest_root / relative_path
+    path_parts = path.parts
+    src_parts = src_root.parts
+    if path_parts[: len(src_parts)] != src_parts:
+        raise ValueError(f"{path!r} does not start with {src_root!r}")
+    relative_parts = path_parts[len(src_parts) :]
+    return dest_root.joinpath(*relative_parts)
```
User description
Here is a diff between the args in normal mode and in worktree mode for a project that has a nested directory for the module root (src/app):
https://www.diffchecker.com/6mFEWvIz/
PR Type
Bug fix, Enhancement
Description
- Align worktree args with normal mode
- Remove noisy LSP info logs
- Compute project roots without worktree flag
- Mirror tests/benchmarks paths in worktree
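To illustrate the last point, here is a minimal sketch of mirroring a set of configured paths from the original repository into a worktree. The helper `mirror_config_paths`, the dictionary keys, and the `/repo` and `/tmp/worktree` locations are all hypothetical and are not taken from the Codeflash code base; only `mirror_path` and the nested `src/app` module root come from this PR.

```python
from __future__ import annotations

from pathlib import Path


def mirror_path(path: Path, src_root: Path, dest_root: Path) -> Path:
    # Same logic as the original two-line implementation in this PR.
    return dest_root / path.relative_to(src_root)


def mirror_config_paths(
    config: dict[str, Path], repo_root: Path, worktree_root: Path
) -> dict[str, Path]:
    # Hypothetical helper: rewrite every configured path so that it points
    # at the same relative location inside the worktree.
    return {
        name: mirror_path(path, repo_root, worktree_root)
        for name, path in config.items()
    }


# Hypothetical configuration with a nested module root (src/app).
config = {
    "module_root": Path("/repo/src/app"),
    "tests_root": Path("/repo/tests"),
    "benchmarks_root": Path("/repo/benchmarks"),
}

mirrored = mirror_config_paths(config, Path("/repo"), Path("/tmp/worktree"))
for name, path in mirrored.items():
    print(f"{name}: {path}")
```

Mirroring every path through one function keeps the worktree args structurally identical to the normal-mode args, which is the property the linked diff is meant to check.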
File Walkthrough
| File | Change | Path |
|---|---|---|
| cli.py | Simplify project root resolution logic | codeflash/cli_cmds/cli.py |
| beta.py | Reduce LSP verbosity and clarify worktree note | codeflash/lsp/beta.py |
| optimizer.py | Exact mirroring of args for worktree mode | codeflash/optimization/optimizer.py |